1 General information

1.1 Welcome Email

Dear students,

A warm welcome to the module Data skills for social work professionals!

I would like to give you some important information on the course:

  1. Attendance in person is mandatory only on the last day (Jan 17), but I strongly urge you to attend the first four days as well. If you work part-time, please schedule your work accordingly.

  2. It is imperative that you gain some first experience with R and RStudio and make sure both run on your computer. Please follow the instructions in the “Installation of R and R-Studio” guide (https://drive.switch.ch/index.php/s/ktNsnWxwkJ3olWG), and if necessary, refer to the linked instructions on YouTube. If you have any questions, please feel free to contact us via email. Please use Copilot with the prompt below to guide you through the installation, explain the software to you in plain language and show you what you can do with it.

  3. Enroll on the Moodle page (Kurs: Data skills for social work professionals (in English) - HS24 | BFH Moodle – Die Lernplattform der Berner Fachhochschule) with the following key: HS24-bsc. At least a week before the course, you will find there a link to the course script, the literature that you need to prepare and other relevant information. We wish you a successful preparation period and look forward to meeting you in person soon. Please let us know should you have any questions.

Kind regards

Dorian Kessler


Text to enter into Copilot (Microsoft Copilot in Bing; important: use the conversation style «more creative/creative mode» (button in the middle of the screen)): As a social work student, I would like to learn the basics of the R programming language so that I can carry out statistical data analyses for projects in social work. I have no prior knowledge of statistics or programming. Can you please give me a step-by-step introduction? Please start by asking whether I have installed R and RStudio and, if not, help me install R and RStudio. Then show me the basic commands and functions of R. I would also like to learn how to carry out simple data analyses (e.g. comparisons of means with dplyr), visualize data (with ggplot2) and interpret results. Please note the following:

  • Proceed step by step. Only tell me about the next step once the current step is complete. After each step, ask whether I was able to complete it successfully, to make sure I have done everything correctly.

  • As a first step, tell me exactly how I can orient myself visually in RStudio and where I need to type my input. Where are the console, the script, the data overview and the file overview in RStudio?

  • Explain to me what the console is and what an R script is, how to create and save an R script, and what scripts are for. Work with me in an R script and tell me how to run commands.

  • Please guide me through practical exercises and give me tasks to consolidate what I have learned.

  • Offer me support when something is unclear.

  • Work with examples that are relevant to social work. Invent relevant data from the areas of social assistance or child and adult protection.

  • Comment the code in detail, line by line, so that I understand it precisely.

  • At the end, offer me further exercises in case I feel like doing more. Suggest exercises.

  • You are an R expert, but you also know that future social workers know little about programming and that technical terms need an explanation in everyday language.

  • Thank you for your motivated support and helpfulness! You are helping me learn R and use this knowledge for clients.

  • Important details:

    • Please omit «print()» unless it is necessary.

    • Whenever you mention the Strg key, also mention Ctrl, in case some people have English Windows keyboards.

2 General Introduction

2.1 Learning Goals

  • People gain awareness of data science tools and how they could be used for social work.

  • People learn how to critically evaluate data science products.

  • People learn how to do data science with R.

2.2 What is data science?

  • Term that emerged ca. 10 years ago. Predecessors: Statistics, Data analysis.

  • The science of creating valuable information from data

  • Practice-oriented science

  • Combines technical and field expertise

2.3 Datafication or why data science is becoming more important in the future

  • Data is the new oil.

  • Data contain information on human behavior, which helps us better understand the human world and solve human problems.

  • In the era of AI, “data literacy” becomes a key skill in all areas of life, including social work, so it should be a basic competence

    • Skills to interpret data

    • Awareness of data and knowing how to use them

    • Skills to analyze data

2.4 How can data science benefit social work?

2.5 Data sources that are relevant for social work

2.5.3 Found data

  • Data not explicitly generated for research
  • Always on
  • Numbers, text, images, audio, video
  • Data from
    • Online activity (digital communication etc.)
    • Smartphone usage (calling, filming, walking etc.)
    • Administrative registries
    • Payments
    • Smart devices
    • Video surveillance

  • Publicly owned individual data

  • Can be linked using social security numbers

The Swiss federation and cantons store data about all of life’s aspects

2.6 Exercise

  • Together with Copilot/ChatGPT, develop an idea of how data science could be used in social work, based on the file “Use cases in social work”
  • Use the following prompt:

You are ChatGPT, and your task is to help me develop a practical example of how data science could be applied in social work. The goal should be an example that is highly useful. Use the file from Dorian Kessler on potential use cases as a reference. Guide me through targeted questions to understand my work context or area of interest and suggest the most relevant application.

Conversation steps:

Understand the context:

Ask me:

  • “Are you currently working in social work? If not, what area interests you most?”

  • “Who are the clients or groups you work with or aim to work with?”

  • “What are common tasks in this field?”

  • “What are the three most pressing problems in your field?”

  • “What data is currently available or could be collected to improve workflows?”

Suggest solutions:

Based on my answers and Dorian Kessler’s file, propose 1–2 realistic examples of how data science could address challenges or improve processes. Briefly explain the benefits.

Get feedback and refine:

Ask:

  • “Does this idea seem relevant and practical for your field? How could it be adjusted to fit better?”

Refine the example with my input and help me select the best option to share with my peers.

  • Post your final ideas on this padlet

2.7 Course plan

3 Measuring the effects of social work

3.1 Why is it important to measure the effects of social work?

Improving practice with better knowledge

3.2 What is an effect and what not?

  • Effect = difference in the result with influencing variable versus without influencing variable (= counterfactual situation)

  • What is the counterfactual situation?

    • The fictional world in which the influencing variable was not present.
  • Exercise
    • Talk to the person sitting next to you.
      • What was the most important event in your life (family, education, work, health, social relationships)?
      • What areas of your life have been affected by this event?
      • What would these areas be like if the event had not happened (can you guess numbers)?
  • Example of effect measures in social work

3.3 How can we measure the effects of social work with quantitative data?

3.3.1 Asking experts

  • Asking individuals about the subjectively measured effect

  • Example: “On a scale from 0 to 10, how much does one daily glass of wine affect your health?”

  • Advantages

    • Easy to measure: one question

    • Subjective expertise: we know a lot about effects (e.g. pain killers)

  • Disadvantages

    • We are unaware of the counterfactual

    • Social desirability bias: we want to please the researcher

3.3.2 Assessing correlations

  • Is there a systematic relationship between two dimensions?

  • Example: wine consumption and dementia

  • Advantages

    • Easy to measure: few questions
      • Wine consumption
      • Dementia symptoms
  • Disadvantages

    • Often: correlation is not equal to causation
    • Why do frequent wine drinkers have less dementia?
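
One possible answer to this question can be made concrete with a small simulation. The sketch below uses invented data and assumes, purely for illustration, that a third factor (here called `education`, a hypothetical confounder) raises wine consumption and lowers dementia symptoms, while wine itself has no effect at all. All variable names and numbers are assumptions, not results from a real study.

```r
# Invented data: a hypothetical confounder drives both variables
set.seed(1)
n <- 5000
education <- rnorm(n)                      # hypothetical confounder
wine      <- 0.8 * education + rnorm(n)    # wine consumption depends on education
dementia  <- -0.8 * education + rnorm(n)   # dementia score depends on education only

# Raw correlation: negative, although wine has no causal effect in this simulation
raw_cor <- cor(wine, dementia)

# Holding the confounder constant: the coefficient of wine is close to zero
adjusted <- coef(lm(dementia ~ wine + education))["wine"]

raw_cor
adjusted
```

In real data the confounder is often unknown or unmeasured, which is why a correlation alone does not tell us the effect.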

3.3.3 Experiments - the gold standard

  • Advantages:

    • Reliable statements about causality

    • Control over treatment

  • Disadvantages:

    • Ethical problems

    • High financial and administrative burden

    • Often limited generalizability

    • Little variation in the treatment (often only two levels: treatment vs. no treatment)

    • Social desirability (except in double-blind studies with placebo)
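
Why random assignment allows statements about causality can be sketched with a simulation. The data below are invented: a coin flip decides who receives a hypothetical treatment, and the true effect is set to 2 points on a made-up well-being scale. The variable names and numbers are assumptions for illustration only.

```r
# Invented data: a randomized experiment with a true effect of 2 points
set.seed(2)
n <- 2000
motivation <- rnorm(n)                           # unobserved client characteristic
treated    <- rbinom(n, size = 1, prob = 0.5)    # coin flip decides who gets the treatment
wellbeing  <- 5 + 2 * treated + motivation + rnorm(n)

# Because assignment is random, both groups are comparable on average,
# so the difference in group means estimates the effect
effect_estimate <- mean(wellbeing[treated == 1]) - mean(wellbeing[treated == 0])
effect_estimate

# The same comparison with a confidence interval
t.test(wellbeing ~ treated)
```

The untreated group stands in for the counterfactual situation of the treated group, which is what makes the simple difference in means interpretable as an effect.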

3.3.4 Natural experiments

  • A random event/dimension (Z) influences the independent variable (X) and affects the outcome (Y) only through X, not directly
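
The logic of Z, X and Y can be sketched with invented data. The example below assumes a hypothetical random event `z` (for instance a lottery) that shifts `x`, an unobserved factor `u` that affects both `x` and `y`, and a true effect of `x` on `y` of 2. It then compares a naive regression with the ratio of the two effects of `z` (often called the Wald estimator). This is one common way to analyze natural experiments, shown here as an illustration and not necessarily the method used in the course.

```r
# Invented data: z is random, shifts x, and touches y only through x
set.seed(3)
n <- 10000
u <- rnorm(n)                          # unobserved factor that affects both x and y
z <- rbinom(n, size = 1, prob = 0.5)   # random event, e.g. a lottery (hypothetical)
x <- z + u + rnorm(n)                  # independent variable
y <- 2 * x + 3 * u + rnorm(n)          # outcome; the true effect of x is 2

# Naive regression is biased because u is not observed
naive <- coef(lm(y ~ x))["x"]

# Wald estimator: effect of z on y divided by effect of z on x
wald <- cov(z, y) / cov(z, x)

naive
wald
```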

3.3.5 Exercise

  • Form four groups: one for each method to measure effects

  • Imagine this: you want to find out how meetings with social workers affect client well-being

  • Please define a research design according to your method of effect measurement

    • What data would you analyze?

    • What numbers would you calculate to measure the effect?

4 Prediction and AI in social work

4.1 Why should we use machines to predict in social work?

  • Prediction is an integral part of individual-level social work

    • It is used for diagnosis

      • Identifying clients’ need for assistance

      • The future development of clients’ outcomes without assistance is an integral part of diagnosis

      • Based on predictions of future outcomes, we decide which clients need our help most

    • It is also used for treatment

  • On an aggregate level, we need to predict future need for services to ensure mobilization of adequate resources (i.e. asking for more funding)

  • Social workers, like all humans, make mistakes when predicting future developments.

  • Machines can help us predict outcomes more accurately, which is why we can call it artificial intelligence

    • The more data we have, the more helpful machine predictions become

4.2 How do machine predictions work?

  • Basic technology: supervised machine learning

  • We need (a lot of) data about outcomes and determinants of these outcomes

    • “Supervised”, because we tell the computer what the outcome is and what the determinants are
  • Using prediction algorithms, the computer finds rules linking determinants to outcomes. This set of rules is called a model.

  • There are simple and less simple prediction algorithms

  • The model is then used to predict unknown outcomes from information on the determinants

  • Prediction models are more useful for social work practice

    • the more precisely machines can predict the outcome, and the less biased they are

    • the clearer it is what can be done to prevent the outcome

    • the more important it is to intervene early
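
The idea of a model as a set of rules can be sketched in a few lines of R. The example uses invented data on past cases and a linear regression as one of the simple prediction algorithms. The variables `age`, `months_jobless` and `months_on_assistance` are hypothetical and only serve as an illustration.

```r
# Invented data: past cases with known outcomes
set.seed(4)
n <- 500
past_cases <- data.frame(
  age            = round(runif(n, 18, 64)),
  months_jobless = round(runif(n, 0, 36))
)
# Hypothetical outcome: months on social assistance
past_cases$months_on_assistance <-
  2 + 0.1 * past_cases$age + 0.5 * past_cases$months_jobless + rnorm(n, sd = 2)

# "Supervised": we name the outcome (left of ~) and the determinants (right of ~)
model <- lm(months_on_assistance ~ age + months_jobless, data = past_cases)
coef(model)   # the rules the computer found

# Use the model to predict the unknown outcome of a new case
new_case   <- data.frame(age = 40, months_jobless = 12)
prediction <- predict(model, newdata = new_case)
prediction
```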

4.3 Examples

4.4 How to train your own prediction model

  • Training a prediction model involves the following steps

    • Acquiring data with past observations of determinants and outcomes

    • Splitting the observations into training data and test data

    • Maximizing predictive performance

      • Measures of predictive performance

        • Continuous outcomes

          • usually the mean squared error
        • Categorical outcomes

          • share of correctly classified observations
      • Play around with the choice of prediction algorithm

      • For algorithms that have parameters: play around with the parameter values
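
These steps can be sketched with base R. The example below uses invented data, an 80/20 split, linear regression for a continuous outcome and logistic regression for a categorical one. The split ratio, the 0.5 classification threshold and the choice of algorithms are assumptions made for illustration. Other algorithms can be swapped in at the marked lines.

```r
# Invented data with a continuous and a categorical outcome
set.seed(5)
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y_cont <- 1 + 2 * d$x1 - d$x2 + rnorm(n)          # continuous outcome
d$y_cat  <- as.integer(d$x1 + d$x2 + rnorm(n) > 0)  # categorical outcome (0/1)

# Split the observations: 80% training data, 20% test data
train_rows <- sample(n, size = 0.8 * n)
train <- d[train_rows, ]
test  <- d[-train_rows, ]

# Continuous outcome: train on the training data, evaluate on the test data
fit_cont <- lm(y_cont ~ x1 + x2, data = train)      # algorithm choice: swap here
mse <- mean((test$y_cont - predict(fit_cont, newdata = test))^2)

# Categorical outcome: logistic regression, classify as 1 if probability > 0.5
fit_cat  <- glm(y_cat ~ x1 + x2, data = train, family = binomial)  # swap here
pred_cat <- as.integer(predict(fit_cat, newdata = test, type = "response") > 0.5)
accuracy <- mean(pred_cat == test$y_cat)            # share correctly classified

mse
accuracy
```

Performance is measured on the test data because the model has not seen these observations during training.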

5 Competency assessment (Kompetenznachweis)

  • You will analyze one of the following data sets and research questions

  • Structure

    • Introduction: presentation of the research question and its relevance for social work

    • Methods: documentation of which data were used and how they were analyzed

    • Results: presentation of the results

    • Conclusion: discussion and interpretation of the results with reference to the subject matter and mandate of social work

  • Students also submit an R code file that records the data preparation and analysis steps. The code file must be reproducible and must produce the results used.

  • The competency assessment (documentation, R code) is written in groups of 2-3 people but contains individually authored parts in the text or in the code file (e.g. in the text: introduction, methods, results, conclusion; in the code: data preparation and analysis). The individual contributions must be identified as such at the end of the documentation (by chapter; for code: by line numbers).

6 Introduction to R

6.1 General Information about R

  • R is free and open source.

  • R has an array of powerful statistical methods.

  • All additional tools can be freely downloaded, installed and loaded as so-called packages.

  • With ggplot2 R allows you to create beautiful figures.

  • With the tidyverse and dplyr, R has the simplest language for data preparation.

  • R is more than just statistical software (cf. shiny).

  • R is well known by ChatGPT.

6.2 Data Science workflow

# Install the required packages if they are not installed yet
required_packages <- c("readxl", "dplyr", "tidyr", "ggplot2", "officer", "flextable")
installed_packages <- installed.packages()

for(pkg in required_packages){
  if(!(pkg %in% rownames(installed_packages))){
    install.packages(pkg)
  }
}

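# Load the packages (the same ones that were installed above)
library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)
library(officer)
library(flextable)

# Set the working directory (adjust the path to a folder on your own computer)
setwd("C:/Users/kld1/Downloads/")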


# 1. Download the Excel file
#https://www.pxweb.bfs.admin.ch/pxweb/de/px-x-1304030000_134/-/px-x-1304030000_134.px/table/tableViewLayout2/

url <- "https://www.pxweb.bfs.admin.ch/sq/ecfd5274-e21f-4d26-9bcf-5326af3edc9a"
destfile <- "sozialhilfe.xlsx"

download.file(url, destfile, mode = "wb")

# 2. Read in and prepare the data

# Read in the Excel sheet (if there are several sheets, select the right one)
# Here we assume that the data are in the first sheet
raw_data <- read_excel(destfile, sheet = 1, skip = 2)  # Skip the first 2 rows, which contain meta information

# Prepare: select columns, rename, select rows
data <- raw_data%>%
  select(Kanton='...2',contains("20"))%>%
  filter(!is.na(Kanton),Kanton %in% c("Bern / Berne","Zürich","Basel-Stadt","Genève"))

# Transform the data from wide to long format
long_data <- data %>%
  pivot_longer(
    cols = `2009`:`2022`,
    names_to = "Jahr",
    values_to = "Anzahl"
  ) %>%
  mutate(Jahr = as.integer(Jahr),
         Anzahl = as.numeric(Anzahl))

# 3. Create the figure with ggplot2

# Create a nice ggplot figure
plot <- ggplot(long_data, aes(x = Jahr, y = Anzahl, color = Kanton)) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  theme_minimal() +
  labs(
    title = "Anzahl Sozialhilfebeziehende pro Kanton (2009-2022)",
    x = "Jahr",
    y = "Anzahl Sozialhilfebeziehende",
    color = "Kanton"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 12),
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 10)
  )

# Save the figure as an image so that it can be inserted into Word
ggsave("sozialhilfe_plot.png", plot = plot, width = 12, height = 8, dpi = 300)

# 4. Insert the figure into a Word document

# Create a new Word document
doc <- read_docx()

# Add a title
doc <- doc %>%
  body_add_par("Anzahl der Sozialhilfebeziehenden pro Kanton (2009-2022)", style = "heading 1")

# Add the figure
doc <- doc %>%
  body_add_img(src = "sozialhilfe_plot.png", width = 6, height = 4, style = "centered")

# Optional: add a table with the data
# Build the table data (the four selected cantons, all years)
table_data <- long_data %>%
  filter(Kanton %in% c("Bern / Berne","Zürich","Basel-Stadt","Genève"))

ft <- flextable(table_data) %>%
  # Automatically adjust column widths to fit the content
  autofit() %>%
  # Set the width of the three columns by hand (in inches)
  width(j = 1:3, width = 1.5) %>%
  # Let the table use 100% of the document width
  set_table_properties(width = 1, layout = "autofit") %>%
  # Optional: enhance the table aesthetics
  theme_box() %>%
  fontsize(size = 10, part = "all") %>%
  bold(part = "header")  # Bold the header row

# Add the table
doc <- doc %>%
  body_add_par("Beispieltabelle der Daten", style = "heading 2") %>%
  body_add_flextable(ft)

# Save the Word document
print(doc, target = "Sozialhilfe_Report.docx")
  • RStudio Environment

    • Console Window
    • Source Editor (Syntax window)
    • File Window, Plot Window
    • Environment Window, History Window

6.3 AI coding assistants

  • ChatGPT and other frontier Large Language Models know R pretty well (Copilot, ranked 22nd, also works, but the newest ChatGPT, ranked 1st, is more capable)

  • After you ask a question, tell ChatGPT what your data look like. If you have no sensitive data, just paste the data in to show ChatGPT the structure. If you have sensitive data, just paste the header (= variable names)

  • Paste the resulting code back into the R-Script and run the code

  • If you have errors, paste the error (from the console) back into ChatGPT and tell it to solve the problem.

  • When you adapt parts of your overall code, tell ChatGPT to return only the relevant code.

  • If it doesn’t comment code, ask to comment and explain what each piece of code does.
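
A minimal sketch of how the structure of a data set can be extracted for pasting into a chat. The data frame `clients` is invented and stands in for your own data. Which output is safe to share depends on your data, so treat the comments as a rule of thumb.

```r
# Invented example data (stands in for your own, possibly sensitive, data)
clients <- data.frame(
  id     = 1:3,
  age    = c(34, 51, 27),
  canton = c("Bern", "Luzern", "Basel-Stadt")
)

# For sensitive data: share only the variable names and types
names(clients)
sapply(clients, class)

# Only for non-sensitive data: the first rows, in a form that can be pasted
head(clients)
dput(head(clients))
```

The output of `dput()` is R code that recreates the object, so the assistant can work with exactly the same structure.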

6.4 Exercise

  • Open a new R script

  • Copy the code above into the R Script

  • Copy the code above into ChatGPT or Copilot

  • Ask it to assist you with the following tasks:

    • Exercise 1: Add Luzern and Waadt to the plot and table
    • Exercise 2: Make the plot more beautiful by adding dots to the lines and by making sure every year is displayed on the x-Axis
    • Exercise 3: Develop a bar chart that displays the number of social assistance recipients for each canton for the year 2022 ordered by number of recipients. Ensure the chart includes appropriate titles and axis labels. Also ask it to label the bars with the values (with vertical alignment). Save the bar chart as a separate PNG file.

6.5 Reading in data

  • R allows you to read in data in almost any format

  • A very common data storage format is the Excel table. You can open Excel files with the readxl package.

  • Another very common data storage format is csv (comma separated values)

  • The best way to deal with large data is to use the data.table package (to read in csv data) and the arrow package (to save and read in large data)
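
For csv files, base R is enough to get started. The sketch below writes a small invented data frame to a temporary csv file and reads it back in. The data and the object names are assumptions for illustration. For large files, `data.table::fread()` plays the same role as `read.csv()`; it is not used here so that the example runs without extra packages.

```r
# Invented example data
attendance <- data.frame(
  date      = c("2023-06-01", "2023-06-08"),
  attendees = c(12, 15)
)

# Write it to a temporary csv file and read it back in
csv_file <- tempfile(fileext = ".csv")
write.csv(attendance, csv_file, row.names = FALSE)

attendance_read <- read.csv(csv_file)
str(attendance_read)
```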

#Set the working directory. Here we use the download folder

setwd("C:/Users/kld1/Downloads/")

#Download data to the folder by hand

#Büro: https://drive.switch.ch/index.php/s/gdNYHopxWDCV9hr
#Turnhalle: https://drive.switch.ch/index.php/s/am1T36ehPL24QuQ

#Install and load excel package
install.packages("readxl")
library(readxl)

#Read in data from the working directory (the readxl function is read_excel(), with an underscore)

Buero <- read_excel("OJAOffice_Statistikdaten_Jugendbüro Oberburg 23.xlsx",sheet="Statistikdaten 2024")
Turnhalle<- read_excel("OJAOffice_Statistikdaten_offene Turnhalle 24.xlsx",sheet="Statistikdaten 2024")

#Demonstrate what the arguments range and col_names = FALSE do
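
What `range` and `col_names = FALSE` do can be tried out without the course files. The sketch below uses `datasets.xlsx`, an example workbook that is installed together with readxl. The sheet name `mtcars` and the cell range are chosen only for illustration.

```r
library(readxl)

# Example workbook that is installed together with readxl
path <- readxl_example("datasets.xlsx")

# Default: the first row of the sheet is used for the column names
head(read_excel(path, sheet = "mtcars"))

# range: read only the cells A1 to C4
# col_names = FALSE: treat the first row as data; columns are named ...1, ...2, ...3
part <- read_excel(path, sheet = "mtcars", range = "A1:C4", col_names = FALSE)
part
```

This combination is useful when the values you need sit in a fixed block of cells in the middle of a sheet that has no proper header row.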

6.6 Looking at data

  • RStudio allows you to manually scroll through data

  • This helps you better understand what is going on

#You can either click on the object or...

#use View()
View(Buero)
View(Turnhalle)

#Or even edit data by hand with fix() (never do this: manual edits are not reproducible)
fix(Buero)

6.7 Exercise: reading in data and looking at it

  1. Goal: read in the data needed to measure the change in gender composition after the introduction of the mixed-gender youth club in summer 2023.
  2. For each file, what are the column names under which you find information on the date of the attendance, the number of attendees and the age and gender composition of the attendees?
Object name data should be saved with | Year | Sheet to read in | Source Link
--- | --- | --- | ---
Maedels_22 | 2022 | Statistikdaten 2024 | OJAOffice_Statistikdaten_Moditrff.xlsx
Maedels_23 | 2023 | Moditräff | Statistik OJA Angebote Burgdorf 2023.xlsx
Jungs_22 | 2022 | Gieleträff | OJAOffice_Statistikdaten_Gieltrff.xlsx
Jungs_23 | 2023 | Gieleträff, range="A11:B11", col_names = FALSE | Statistik OJA Angebote Burgdorf 2023.xlsx
JuBu_23 | 2023 | JuBU Träff 5&6 | Statistik OJA Angebote Burgdorf 2023.xlsx
JuBu_24 | 2024 | Statistikdaten 2024 | Copy of OJAOffice_Statistikdaten_Mittelstufentreff 24.xlsx

6.8 Simple manipulation of one data frame with dplyr

6.8.1 General

  • package dplyr by Hadley Wickham/Romain Francois offers a toolset for data preparation
  • See the dplyr vignette and the Data Wrangling Cheat Sheet for a very good overview
  • filter(): selects a subset of rows (see also slice())
  • arrange(): sorts
  • select(): selects columns
  • mutate(): creates new columns
  • summarize(): aggregates (collapses) data to individual data points
  • distinct(): removes duplicate values
  • group_by(): defines subgroups in the data so that mutate() and summarize() can be applied separately per group.
  • dplyr can be used very well together with so-called piping, i.e. the data object is passed from function to function by %>%, which makes the code much easier to read and more compact.
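
The verbs listed above can be chained in a single pipeline. The sketch below uses a small invented data frame of social assistance cases. The column names and values are assumptions for illustration and have nothing to do with the course data.

```r
library(dplyr)

# Invented data on social assistance cases (for illustration only)
cases <- data.frame(
  canton = c("Bern", "Bern", "Bern", "Luzern", "Luzern", "Luzern"),
  age    = c(23, 41, 41, 35, 58, 19),
  months = c(6, 14, 14, 3, 20, 8)
)

result <- cases %>%
  distinct() %>%                                     # remove the duplicated row
  filter(age >= 25) %>%                              # keep people aged 25 or older
  mutate(years = months / 12) %>%                    # create a new column
  select(canton, age, years) %>%                     # keep only these columns
  group_by(canton) %>%                               # define subgroups
  summarize(n = n(), mean_years = mean(years)) %>%   # one row per canton
  arrange(desc(mean_years))                          # sort by mean duration

result
```

Read the pipe `%>%` as "and then": each line takes the result of the previous line as its input.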

6.8.2 Example: Package dplyr

# load data

6.8.3 Exercise: Package dplyr

  • Read the SHP data into RStudio.

  • Restrict the data set for the year 2022 to people who are 25 years or older. Familiarize yourself a little with the data (e.g. head(), summary(), table()).

  • Look at the variables with the information on age (variable AGE), gender (variable SEX), years of education (variable EDYEAR) and first nationality (NAT_1_).

  • Create a crosstab with the variables EDYEAR and SEX.

  • Calculate the mean and standard deviation of the variable AGE separately for men and women.

6.8.4 Solution: Package dplyr

6.9 Merging and reshaping data

6.9.1 Rbind, cbind

6.9.2 Merge

6.10 Dealing with text

6.10.1 Regular expressions –> ChatGPT

6.10.2 Grepl

6.10.3 Gsub

6.11 Univariate analysis

6.11.1 Frequencies and distributions

6.11.2 Mean, median, mode

6.12 Analysing associations

6.12.1 Grouped mean

6.12.2 Linear models

6.13 Nice tables

6.13.1 Huxtable

6.14 Nice graphs

6.14.1 Ggplot2

6.15 Workflow

  • Save graphs as png and link them into Word

  • Save tables as docx and link them into Word